QurAna: Corpus of the Quran annotated with Pronominal Anaphora

Authors

  • Abdul-Baquee M. Sharaf
  • Eric Atwell
Abstract

This paper presents QurAna, a large corpus built from the original Quranic text in which personal pronouns are tagged with their antecedents. These antecedents are maintained as an ontological list of concepts, which has proved helpful for information retrieval tasks. QurAna is characterized by (a) a comparatively large number of pronouns tagged with antecedent information (over 24,500 pronouns) and (b) the maintenance of an ontological concept list derived from these antecedents. We have shown useful applications of this corpus. It is the first of its kind covering Classical Arabic text, and could also be used for applications in Modern Standard Arabic. The corpus will enable researchers to derive empirical patterns and rules for building new anaphora resolution approaches, and it can be used to train, optimize and evaluate existing approaches.
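The pairing of pronoun tags with a shared concept list can be pictured with a small sketch. Everything here is an assumption made for illustration: the names `Concept`, `PronounTag`, `build_concept_index` and `verses_for_concept`, the field layout, and the placeholder data are hypothetical and do not reflect QurAna's actual file format or annotations. The sketch only shows why linking each pronoun to a concept identifier makes concept-based retrieval straightforward.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One entry in the concept list (hypothetical layout)."""
    concept_id: int
    label: str


@dataclass(frozen=True)
class PronounTag:
    """A pronoun occurrence linked to the concept it refers to (hypothetical layout)."""
    chapter: int
    verse: int
    token_index: int
    concept_id: int


def build_concept_index(tags):
    """Map each concept id to the set of (chapter, verse) pairs that refer to it."""
    index = defaultdict(set)
    for tag in tags:
        index[tag.concept_id].add((tag.chapter, tag.verse))
    return index


def verses_for_concept(index, concept_id):
    """Return the sorted verses in which some pronoun refers to the concept."""
    return sorted(index.get(concept_id, set()))


# Placeholder data for illustration only; these are NOT real QurAna annotations.
concepts = {1: Concept(1, "concept-A"), 2: Concept(2, "concept-B")}
tags = [
    PronounTag(chapter=1, verse=2, token_index=3, concept_id=1),
    PronounTag(chapter=1, verse=5, token_index=1, concept_id=1),
    PronounTag(chapter=2, verse=7, token_index=4, concept_id=2),
    PronounTag(chapter=1, verse=2, token_index=6, concept_id=1),
]

index = build_concept_index(tags)
print(concepts[1].label, verses_for_concept(index, 1))
```

Under these assumptions, a query for a concept returns verses where the concept is referred to only by a pronoun, which a plain keyword search over the surface text would miss.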

Similar resources

A Hybrid Approach to Pronominal Anaphora Resolution in Arabic

Corresponding author: Abdullatif Abolohom, Department of Computer Science, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia. Abstract: One of the challenges in natural language processing is determining the intended referents of pronouns in the discourse. Performing anaphora resolution ...

Zero Pronominal Anaphora Resolution for the Romanian Language

This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...

The DAD Parallel Corpora and their Uses

This paper deals with the uses of the annotations of third person singular neuter pronouns in the DAD parallel and comparable corpora of Danish and Italian texts and spoken data. The annotations contain information about the functions of these pronouns and their uses as abstract anaphora. Abstract anaphora have constructions such as verbal phrases, clauses and discourse segments as antecedents ...

Where Anaphora and Coreference Meet. Annotation in the Spanish CESS-ECE Corpus

This paper describes the guidelines of the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, which is a significant step towards the definition of an exhaustive typology of pronominal and full NP coreferential expressions and their relations for Spanish. The goal is twofold. From a computational perspective, this work establishes the formal foundatio...

Pronominal Reference Type Identification and Event Anaphora Resolution for Hindi

In this paper, we present hybrid approaches for pronominal reference type (abstract or concrete) identification and event anaphora resolution for Hindi. Pronominal reference type identification is one of the important parts for any anaphora resolution system as it helps anaphora resolver in optimal feature selection based on pronominal reference types. We use language specific rules and feature...

Publication date: 2012